Experimental Results from a Random Forest (Decision Tree Ensemble) Based NER Model

A Random Forest is an ensemble of Decision Trees, each trained on a random subset of a given data set, whose individual predictions are combined (typically by majority vote) into a single classification decision.
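As a minimal illustration of this idea (assuming scikit-learn; the toy data and hyperparameters are illustrative, not those used by this report's scripts):

```python
# Minimal sketch of a Random Forest classifier (assumes scikit-learn; the
# synthetic data stands in for the sparse ARFF feature vectors used here).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy binary classification data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each of the 100 trees is fit on a bootstrap sample of the instances and
# considers a random subset of features at each split; the trees' votes
# are combined into a single classification decision.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(len(forest.estimators_))  # 100 individual decision trees
```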

This report provides precision, recall, and f-measure values for Random Forest models built on the orthographic; orthographic + morphological; and orthographic + morphological + lexical feature sets, for the Adverse Reaction, Indication, Active Ingredient, and Inactive Ingredient entities. A viewable Decision Tree structure will also be generated for each fold.


The file 'randomforest.py' builds a Random Forest Ensemble classifier on the sparse-format ARFF file passed in as a parameter. The resulting model is saved in the models directory with the name format 'randomforest_[featuresets]_[entity name].pkl'.
The file 'evaluate_randomforest.py' evaluates a Random Forest Ensemble model stored inside a given '.pkl' file, outputting the appropriate statistics and saving a PDF image of the underlying decision structure associated with that model.
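The '.pkl' round trip can be sketched as follows (a minimal sketch assuming joblib, which scikit-learn uses for model persistence; the toy model and the example file name are illustrative, not taken from the scripts):

```python
# Sketch of the save/load convention described above (assumes joblib; the
# trained model here is a toy stand-in for the real classifiers).
import os
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 1, 1])

# 'randomforest_[featuresets]_[entity name].pkl' in the models directory.
os.makedirs("models", exist_ok=True)
path = os.path.join("models", "randomforest_o_adversereaction.pkl")
joblib.dump(model, path)

# evaluate_randomforest.py would then reload the model the same way.
reloaded = joblib.load(path)
print(reloaded.predict([[1, 0]])[0])
```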

All ARFF files were cleaned with 'arff_translator.py'. This cleaning removed a comma that was mistakenly inserted into each instance during file creation.

python3 arff_translator.py [filename]
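A hypothetical sketch of that cleaning pass (the real arff_translator.py may differ; this assumes the stray comma shows up as a doubled ',' in each data line after the @data marker):

```python
# Hypothetical sketch of the ARFF cleaning step (the real arff_translator.py
# may work differently; this assumes the mistakenly inserted comma appears
# as a doubled ',,' in each data line).
import os
import tempfile

def clean_arff(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        in_data = False
        for line in src:
            if line.strip().lower() == "@data":
                in_data = True
            elif in_data and ",," in line:
                line = line.replace(",,", ",", 1)  # drop the one stray comma
            dst.write(line)

# Tiny demo on a synthetic sparse-format ARFF fragment.
tmp = tempfile.mkdtemp()
raw = os.path.join(tmp, "raw.arff")
cleaned = os.path.join(tmp, "clean.arff")
with open(raw, "w") as f:
    f.write("@data\n{0 1,, 1 O}\n")
clean_arff(raw, cleaned)
print(open(cleaned).read())
```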

Adverse Reaction Feature Set

Orthographic Features


In [1]:
import os
import subprocess

""" Creates models for each fold and runs evaluation with results """
featureset = "o"
entity_name = "adversereaction"

for fold in range(1, 1):  # empty range: training has already been done
    training_data = "../ARFF_Files/%s_ARFF/_%s/_train/%s_train-%i.arff" % (entity_name, featureset, entity_name, fold)
    os.system("python3 randomforest.py -tr %s" % (training_data))


for fold in range(1,11):
    testing_data = "../ARFF_Files/%s_ARFF/_%s/_test/%s_test-%i.arff" % (entity_name, featureset, entity_name, fold)
    output = subprocess.check_output("python3 evaluate_randomforest.py -te %s" % (testing_data), shell=True)
    print(output.decode('utf-8'))


adversereaction_test-1.arff
Precision: 0.961538
Recall: 0.013789
[[   25  1788]
 [    1 16927]]


adversereaction_test-2.arff
Precision: 0.750000
Recall: 0.008167
[[    9  1093]
 [    3 19878]]


adversereaction_test-3.arff
Precision: 0.333333
Recall: 0.001961
[[    1   509]
 [    2 10642]]


adversereaction_test-4.arff
Precision: 1.000000
Recall: 0.008540
[[   10  1161]
 [    0 10655]]


adversereaction_test-5.arff
Precision: 0.571429
Recall: 0.010852
[[   20  1823]
 [   15 18196]]


adversereaction_test-6.arff
Precision: 0.166667
Recall: 0.002210
[[    2   903]
 [   10 13178]]


adversereaction_test-7.arff
Precision: 0.800000
Recall: 0.006098
[[    4   652]
 [    1 18655]]


adversereaction_test-8.arff
Precision: 0.708333
Recall: 0.020118
[[   17   828]
 [    7 15856]]


adversereaction_test-9.arff
Precision: 0.500000
Recall: 0.001765
[[   2 1131]
 [   2 8715]]


adversereaction_test-10.arff
Precision: 0.538462
Recall: 0.006261
[[    7  1111]
 [    6 15010]]


Orthographic features alone yield relatively high precision but very low recall. This implies that orthographic features by themselves are not enough to carve out a decision boundary that covers all of the positive instances, hence the low recall. However, the decision boundary that is learned is very selective, as indicated by the high precision.
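As a sanity check, the reported values can be reproduced from the printed confusion matrices, which appear to put the positive (entity) class first, with rows as true labels and columns as predictions:

```python
# Reconstructing fold 1's metrics from its printed confusion matrix,
# read as [[TP, FN], [FP, TN]] with the positive (entity) class first.
matrix = [[25, 1788], [1, 16927]]  # adversereaction_test-1.arff
tp, fn = matrix[0]
fp, tn = matrix[1]

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print("Precision: %f" % precision)  # matches the reported 0.961538
print("Recall: %f" % recall)        # matches the reported 0.013789
```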

Orthographic + Morphological Features


In [2]:
import os
import subprocess

""" Creates models for each fold and runs evaluation with results """
featureset = "om"
entity_name = "adversereaction"

for fold in range(1, 1):  # empty range: training has already been done
    training_data = "../ARFF_Files/%s_ARFF/_%s/_train/%s_train-%i.arff" % (entity_name, featureset, entity_name, fold)
    os.system("python3 randomforest.py -tr %s" % (training_data))


for fold in range(1,11):
    testing_data = "../ARFF_Files/%s_ARFF/_%s/_test/%s_test-%i.arff" % (entity_name, featureset, entity_name, fold)
    output = subprocess.check_output("python3 evaluate_randomforest.py -te %s" % (testing_data), shell=True)
    print(output.decode('utf-8'))


adversereaction_test-1.arff
Precision: 0.823755
Recall: 0.474352
[[  860   953]
 [  184 16744]]


adversereaction_test-2.arff
Precision: 0.502982
Recall: 0.459165
[[  506   596]
 [  500 19381]]


adversereaction_test-3.arff
Precision: 0.566845
Recall: 0.415686
[[  212   298]
 [  162 10482]]


adversereaction_test-4.arff
Precision: 0.800781
Recall: 0.525192
[[  615   556]
 [  153 10502]]


adversereaction_test-5.arff
Precision: 0.773810
Recall: 0.423223
[[  780  1063]
 [  228 17983]]


adversereaction_test-6.arff
Precision: 0.618081
Recall: 0.370166
[[  335   570]
 [  207 12981]]


adversereaction_test-7.arff
Precision: 0.423622
Recall: 0.410061
[[  269   387]
 [  366 18290]]


adversereaction_test-8.arff
Precision: 0.525606
Recall: 0.461538
[[  390   455]
 [  352 15511]]


adversereaction_test-9.arff
Precision: 0.803571
Recall: 0.476611
[[ 540  593]
 [ 132 8585]]


adversereaction_test-10.arff
Precision: 0.756426
Recall: 0.552773
[[  618   500]
 [  199 14817]]


Adding the morphological features greatly increased classifier performance, lifting recall from below 0.03 to between roughly 0.37 and 0.55 while keeping precision comparable.
The underlying decision tree structure for each fold is saved as a PDF by 'evaluate_randomforest.py'.
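One way such a viewable structure can be produced (a sketch assuming scikit-learn's export_graphviz on toy data; evaluate_randomforest.py may do this differently):

```python
# Sketch of generating a viewable tree image (assumes scikit-learn;
# export_graphviz emits Graphviz DOT text, which the 'dot' tool can
# render to PDF).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# A random forest has no single tree; inspect one member of the ensemble.
dot = export_graphviz(forest.estimators_[0], filled=True)
with open("tree_fold1.dot", "w") as f:
    f.write(dot)
# Render with: dot -Tpdf tree_fold1.dot -o tree_fold1.pdf
print(dot[:12])  # digraph Tree
```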

Orthographic + Morphological + Lexical Features


In [1]:
import os
import subprocess

""" Creates models for each fold and runs evaluation with results """
featureset = "omt"
entity_name = "adversereaction"

for fold in range(1, 1):  # empty range: training has already been done
    training_data = "../ARFF_Files/%s_ARFF/_%s/_train/%s_train-%i.arff" % (entity_name, featureset, entity_name, fold)
    os.system("python3 randomforest.py -tr %s" % (training_data))


for fold in range(1,11):
    testing_data = "../ARFF_Files/%s_ARFF/_%s/_test/%s_test-%i.arff" % (entity_name, featureset, entity_name, fold)
    output = subprocess.check_output("python3 evaluate_randomforest.py -te %s" % (testing_data), shell=True)
    print(output.decode('utf-8'))


adversereaction_test-1.arff
Precision: 0.838754
Recall: 0.668505
[[ 1212   601]
 [  233 16695]]


adversereaction_test-2.arff
Precision: 0.537289
Recall: 0.607985
[[  670   432]
 [  577 19304]]


adversereaction_test-3.arff
Precision: 0.723913
Recall: 0.652941
[[  333   177]
 [  127 10517]]


adversereaction_test-4.arff
Precision: 0.826695
Recall: 0.655850
[[  768   403]
 [  161 10494]]


adversereaction_test-5.arff
Precision: 0.752941
Recall: 0.520890
[[  960   883]
 [  315 17896]]


adversereaction_test-6.arff
Precision: 0.746367
Recall: 0.624309
[[  565   340]
 [  192 12996]]


adversereaction_test-7.arff
Precision: 0.557647
Recall: 0.722561
[[  474   182]
 [  376 18280]]


adversereaction_test-8.arff
Precision: 0.523636
Recall: 0.681657
[[  576   269]
 [  524 15339]]


adversereaction_test-9.arff
Precision: 0.855781
Recall: 0.633716
[[ 718  415]
 [ 121 8596]]


adversereaction_test-10.arff
Precision: 0.772600
Recall: 0.741503
[[  829   289]
 [  244 14772]]



Adding the lexical features improved recall further across every fold (to roughly 0.52–0.74) while precision remained comparable, making this the strongest of the three feature sets for the Adverse Reaction entity.